Search CORE

59 research outputs found

Localizing by Describing: Attribute-Guided Attention Localization for Fine-Grained Recognition

Author: Ding Errui
Lin Yuanqing
Liu Xiao
Wang Jiang
Wen Shilei
Publication venue
Publication date: 22/05/2016
Field of study

A key challenge in fine-grained recognition is how to find and represent discriminative local regions. Recent attention models are capable of learning discriminative region localizers only from category labels with reinforcement learning. However, not utilizing any explicit part information, they are not able to accurately find multiple distinctive regions. In this work, we introduce an attribute-guided attention localization scheme where the local region localizers are learned under the guidance of part attribute descriptions. By designing a novel reward strategy, we are able to learn to locate regions that are spatially and semantically distinctive with reinforcement learning algorithm. The attribute labeling requirement of the scheme is more amenable than the accurate part location annotation required by traditional part-based fine-grained recognition methods. Experimental results on the CUB-200-2011 dataset demonstrate the superiority of the proposed scheme on both fine-grained recognition and attribute recognition

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

WordSup: Exploiting Word Annotations for Character based Text Detection

Author: Ding Errui
Han Junyu
Hu Han
Luo Yuxuan
Wang Yuzhuo
Zhang Chengquan
Publication venue
Publication date: 22/08/2017
Field of study

Imagery texts are usually organized as a hierarchy of several visual elements, i.e. characters, words, text lines and text blocks. Among these elements, character is the most basic one for various languages such as Western, Chinese, Japanese, mathematical expression and etc. It is natural and convenient to construct a common text detection engine based on character detectors. However, training character detectors requires a vast of location annotated characters, which are expensive to obtain. Actually, the existing real text datasets are mostly annotated in word or line level. To remedy this dilemma, we propose a weakly supervised framework that can utilize word annotations, either in tight quadrangles or the more loose bounding boxes, for character detector training. When applied in scene text detection, we are thus able to train a robust character detector by exploiting word annotations in the rich large-scale real scene text datasets, e.g. ICDAR15 and COCO-text. The character detector acts as a key role in the pipeline of our text detection engine. It achieves the state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline by various scenarios, including deformed text detection and math expression recognition.Comment: 2017 International Conference on Computer Visio

arXiv.org e-Print Archive

Crossref